Abstract: This paper develops a test for homogeneity in finite mixture models
where the mixing proportions are known a priori (taken to be 0.5) and a common
nuisance parameter is present. Statistical tests based on the notion of
Projected Likelihood Contrasts (PLC) are considered. The PLC is a slight
modification of the usual likelihood ratio statistic, or Wilks' $\Lambda$, and is
similar in spirit to Rao's score test. Theoretical investigations have been
carried out to understand the large sample statistical properties of these tests.
Simulation studies have been carried out to understand the behavior of the
null distribution of the PLC statistic in the case of Gaussian mixtures with
unknown means (common variance as nuisance parameter) and unknown variances
(common mean as nuisance parameter). The results are in conformity
with the theoretical results obtained. Power functions of these tests have been
evaluated based on simulations from Gaussian mixtures.

1. Introduction 

Finite mixture models are often used to understand whether the data come from
a heterogeneous or a homogeneous population. In particular, consider the case of
a mixture of two populations with the mixing proportions known (Goffinet et al.
[7]). We are interested in knowing whether the data are sampled from a proper
mixture of two distributions or from a single distribution.

In particular, consider a mixture family $g$, with generating population densities
given by $\mathcal{M}_0 = \{ f(\cdot \mid \theta, \eta) : \theta \in \Theta,\ \eta \in \Xi \}$,
where $\theta$ is the main parameter of interest and $\eta$ is the common nuisance
parameter. We assume that the mixing proportion is known a priori to be 0.5.
The mixture model then becomes

(1.1) $g(z \mid \theta_1, \theta_2, \eta) = 0.5\, f(z \mid \theta_1, \eta) + 0.5\, f(z \mid \theta_2, \eta).$

The null hypothesis of homogeneity is $\theta_1 = \theta_2$.

In several practical examples (for example, arising in speech analysis and
nonparametric regression methodology) detection of the location of a
discontinuity in the local mean or the local variance (or local amplitude) is of
interest (Figure 1). The theoretical results developed in this paper can be used
in such problems. Figure 1 demonstrates several scenarios of signals being
scanned through a running window of specified bandwidth. When the center of the
window is placed at a point of discontinuity, the raw signal values (y-axis) will
have a distribution which can be adequately modeled by (1.1). This basic idea
has been explored by Hall and Titterington [8] in the context of edge and peak
preserving smoothers.

Fig 1. Left column shows time plots of data with solid vertical lines marking
the windows considered. The top two panels indicate a simulated noisy signal
(with additive Gaussian noise) with mean function having a jump discontinuity.
The bottom panels describe a portion of a digitized speech waveform. In the right
column three fitted densities of y-values are shown, corresponding to the frames
indicated in the left column: nonparametric kernel smoothed density (solid line),
single component Gaussian fit (dashed line) and mixture of two Gaussian fit with
equal mixing weights (curve indicated by +).

A brief list of references dealing with the study of mixture distributions and
properties of the Likelihood Ratio Test (LRT) is provided below. In
Titterington et al. [13], McLachlan and Basford [11] and Lindsay [10] one may find
extensive discussions about the background of finite mixture models. The
asymptotic distributions of the LRT in mixture models have been studied in Bickel
and Chernoff [1], Chernoff and Lander [5], Ghosh and Sen [6], Lemdani and Pons [9].
Different modifications of the LRT in mixture models are proposed and studied
by Chen et al. [4] and Self and Liang [12].

In this paper we introduce the concept of Projected Likelihood Contrasts (PLC),
a modified version of the LRT or Wilks' $\Lambda$ statistic (Wilks [14]), which we
motivate as follows. Consider i.i.d. observations $Z_1, Z_2, \ldots, Z_N$ generated
by some element of the class of densities $g$ given by (1.1). The (log-)likelihood
under the full mixture model is given by

(1.2) $L_N(\theta_1, \theta_2, \eta) = \sum_{i=1}^{N} \log g(Z_i \mid \theta_1, \theta_2, \eta),$

where $g$ is defined through (1.1). Under the null hypothesis the likelihood reduces
to the usual likelihood under $\mathcal{M}_0$, namely,

(1.3) $L_N(\theta, \theta, \eta) = \sum_{i=1}^{N} \log f(Z_i \mid \theta, \eta).$

Define $(\hat\theta, \hat\eta)$ as the maximum likelihood estimators of $(\theta, \eta)$
under (1.3). The idea behind the PLC statistic is to plug the nuisance parameter
estimated under the null into (1.2) and maximize over the remaining parameters
$\theta_1$ and $\theta_2$. Finally the PLC statistic is defined as

(1.4) $\Lambda_N = 2 \Bigl( \max_{\theta_1, \theta_2} L_N(\theta_1, \theta_2, \hat\eta) - L_N(\hat\theta, \hat\theta, \hat\eta) \Bigr).$

The term projected likelihood is used here to distinguish the procedure from profile
likelihood. We call it projected likelihood because the profile of the nuisance
parameter is obtained after projecting the full likelihood onto
$f(\cdot \mid \theta, \eta) \in \mathcal{M}_0$. That way we first obtain a projected
profile of $\eta$ and then maximize it, so that its estimate coincides with the
maximum likelihood estimate (MLE) under the null hypothesis. Note that this
procedure is, in spirit, very similar to Rao's score test.
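As a rough illustration of definition (1.4), the following Python sketch computes the PLC for the Gaussian family with unknown means and a common variance. The function names (`log_normal_pdf`, `mixture_loglik`, `plc_gaussian_means`), the EM iteration used for the inner maximization and the starting offsets are all assumptions made here for illustration; the paper does not specify an optimizer.

```python
import math
import random


def log_normal_pdf(z, mean, var):
    """Log density of N(mean, var) evaluated at z."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (z - mean) ** 2 / var


def mixture_loglik(data, theta1, theta2, var):
    """Log-likelihood (1.2) of the equal-weight mixture of N(theta1, var) and N(theta2, var)."""
    total = 0.0
    for z in data:
        a = log_normal_pdf(z, theta1, var)
        b = log_normal_pdf(z, theta2, var)
        m = max(a, b)  # log-sum-exp trick for numerical stability
        total += m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))
    return total


def plc_gaussian_means(data, n_iter=200, offsets=(0.5, 1.0, 2.0)):
    """PLC statistic (1.4) with the variance held fixed at its null MLE.

    The inner maximization over (theta1, theta2) uses EM from a few
    starting points; this optimizer is an illustrative assumption.
    """
    n = len(data)
    theta_hat = sum(data) / n                               # null MLE of the mean
    eta_hat = sum((z - theta_hat) ** 2 for z in data) / n   # null MLE of the variance
    null_ll = mixture_loglik(data, theta_hat, theta_hat, eta_hat)
    best_ll = null_ll  # theta1 = theta2 = theta_hat is always feasible
    sd = math.sqrt(eta_hat)
    tiny = 1e-300
    for k in offsets:
        t1, t2 = theta_hat - k * sd, theta_hat + k * sd
        for _ in range(n_iter):
            sum_r = sum_rz = 0.0
            for z in data:
                # E-step: responsibility of component 1, computed stably
                d = log_normal_pdf(z, t2, eta_hat) - log_normal_pdf(z, t1, eta_hat)
                if d > 0:
                    e = math.exp(-d)
                    r = e / (1.0 + e)
                else:
                    r = 1.0 / (1.0 + math.exp(d))
                sum_r += r
                sum_rz += r * z
            # M-step: weighted means (weights and variance stay fixed)
            t1 = sum_rz / max(sum_r, tiny)
            t2 = (n * theta_hat - sum_rz) / max(n - sum_r, tiny)
        best_ll = max(best_ll, mixture_loglik(data, t1, t2, eta_hat))
    return 2.0 * (best_ll - null_ll)


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]  # homogeneous (null) data
print(plc_gaussian_means(sample))
```

Because $\theta_1 = \theta_2 = \hat\theta$ is always a feasible point of the inner maximization, the returned value is nonnegative by construction.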

The paper is organized as follows. In Section 2, the large sample properties of
the PLC statistic are discussed. In Section 3, some simulation studies are provided.
The proof of the main theorem in Section 2 is provided in the Appendix.



2. Large sample approximation of PLC statistic 

For the purpose of theoretical investigation we shall simplify the model further,
assuming that the densities in the class are all one dimensional. Denote the null
hypothesis by

(2.1) $H_0 : Z_1, Z_2, \ldots, Z_N$ are i.i.d. from a density in $\mathcal{M}_0$.

For notational convenience we adopt the convention that the symbol $D_x^r$ indicates
the $r$-th partial derivative with respect to $x$, treated as a generic argument in a
function. Define the following estimated scores

(2.2) $\hat\xi_r(j) = \dfrac{D_\theta^r f(Z_j \mid \hat\theta, \hat\eta)}{f(Z_j \mid \hat\theta, \hat\eta)}$

for $1 \le j \le N$ and $r \ge 1$. Analogously define the true scores $\xi_r(j)$, with
$(\hat\theta, \hat\eta)$ replaced by the true parameter values under $H_0$. One can
verify that $E_{H_0}\, \xi_r(1) = 0$ for every $r \ge 1$ in case of regular parametric
families.

Note that under regularity assumptions on the model the scores are well behaved
and have finite moments. For the Gaussian case all moments will be finite since the
joint moment generating function of any finite set of polynomials involving the
$\xi_r$'s exists. Define the following mixed partial derivatives of the full
likelihood $L_N$:

(2.3) $C^N_{ij} = (D_{\theta_1} + D_{\theta_2})^i (D_{\theta_1} - D_{\theta_2})^j L_N(\hat\theta, \hat\theta, \hat\eta),$

where $i, j$ are nonnegative integers. Moreover, let $\bar C^N_{ij} = N^{-1} C^N_{ij}$.
Although the quantities defined in (2.3) look rather opaque, they can be expressed
as linear combinations of $D_{\theta_1}^{p} D_{\theta_2}^{q} L_N(\hat\theta, \hat\theta, \hat\eta)$
using the binomial expansion.
One can establish with some effort the following:

$D_{\theta_1}^{p} D_{\theta_2}^{q} \log g(z \mid \hat\theta, \hat\theta, \hat\eta) = \sum^{*} a(\Omega) \prod_{r=1}^{p+q} \hat\xi_r^{\,\omega_r}(z),$

where $\sum^{*}$ runs over all nonnegative integral partitions
$\Omega = (\omega_1, \omega_2, \ldots, \omega_{p+q})$ satisfying $\sum_r r\,\omega_r = p + q$.
The coefficients $a(\Omega)$ are complicated combinatorial quantities but can be
recursively computed. It can be verified that $C^N_{ij} = 0$ if $j$ is odd. We provide
simplified expressions for some of the lower order $\bar C^N_{ij}$ which are necessary
for future calculations.

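(2.4)
$\bar C^N_{20} = \frac{1}{N} \sum_{j=1}^{N} \bigl( \hat\xi_2(j) - \hat\xi_1^2(j) \bigr) \ (\xrightarrow{P} -I), \qquad \bar C^N_{02} = \frac{1}{N} \sum_{j=1}^{N} \hat\xi_2(j),$

$\bar C^N_{12} = \frac{1}{N} \sum_{j=1}^{N} \bigl( \hat\xi_3(j) - \hat\xi_2(j)\hat\xi_1(j) \bigr), \qquad \bar C^N_{04} = \frac{1}{N} \sum_{j=1}^{N} \psi(j),$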

where V'(j) = e4(.7) + 5?i(j)e3(j) - (j) + 3^V(j)6(j), and I is the Fisher 
information of $\theta$ under $H_0$. Finally, let $C_{ij}$ denote the asymptotic expected
values of $\bar C^N_{ij}$ under $H_0$, which can be easily derived using Lemma 2.1 (i).
The distributional properties of $\bar C^N_{ij}$ can be derived using classical
properties of M-estimators. We state the following lemma for the sake of
completeness. The proof can be found in Bickel and Doksum [2].

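Lemma 2.1. Let $Z_1, Z_2, \ldots, Z_N$ be independent and identically distributed random
variables with density $f(z \mid \theta)$ satisfying the usual regularity conditions, with
score function $S(z, \theta)$ and Fisher information matrix $I = \mathrm{Cov}_\theta(S(Z_1, \theta))$.

(i) Let $\psi(z, \theta)$ be a real valued, continuously differentiable (in $\theta$) kernel
with $E_\theta\, \psi^2(Z_1, \theta) < \infty$ for every $\theta$. Further let $\hat\theta_N$
denote the MLE of $\theta$. Then

$\frac{1}{N} \sum_{i=1}^{N} \psi(Z_i, \hat\theta_N) \xrightarrow{P} E_\theta\, \psi(Z_1, \theta).$

(ii) In addition, if $\psi$ satisfies $E_\theta\, \psi(Z_1, \theta) = 0$ for every $\theta$, then

(2.5) $\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \psi(Z_i, \hat\theta_N) \Rightarrow N(0, V_\psi),$

where $V_\psi = E_\theta\, \psi^2 - C' I^{-1} C$ and $C = \mathrm{Cov}_\theta(\psi(Z_1, \theta), S(Z_1, \theta))$.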

Finally, we proceed to the main asymptotic representation theorem of the PLC
statistic. It turns out that even in the Gaussian case the standard
$\chi^2$-approximation does not hold. In fact the Gaussian case is more paradoxical
than one would expect. As a result one has to go for a higher order expansion to
get an idea of the limiting behavior of the statistic. The crucial issue is whether
$E_{H_0}\, \xi_1(1)\xi_2(1) = 0$ or not. This is a measure of some type of spurious
non-degeneracy in the model due to skewness, and its asymptotic effect needs to be
corrected for. Two cases are considered in the simulation section. In the first
case we consider a Gaussian mixture with different means but common unknown
variance, and in the second case a Gaussian scale mixture with common unknown
mean. In both cases we find $E_{H_0}\, \xi_1(1)\xi_2(1) = 0$. The first case is covered
by Theorem 2.2(i) below while the second case is covered by Theorem 2.2(ii). We
state the theorem keeping these two special cases in mind. The proof of the
theorem is provided in the Appendix.
Theorem 2.2. Assume that $E_{H_0}\, \xi_1(1)\xi_2(1) = 0$ and $C_{04} < 0$. Then under $H_0$,

(i) if $\bar C^N_{02} = 0$, then $\Lambda_N \xrightarrow{P} 0$.
(ii) if $\sqrt{N}\, \bar C^N_{02} \Rightarrow N(0, \sigma^2)$ for some $\sigma^2 > 0$, then

(2.6) $\Lambda_N \Rightarrow c^2 \max(0, Z)^2$

for suitable $c^2 > 0$ and a standard normal variate $Z$.

3. Simulation studies in the case of Gaussian mixtures 

In this section we provide results pertaining to the sampling distributions of the
PLC statistic under the null in the case of Gaussian mixtures [7]. Studies have been
carried out for two different cases: unknown variances with the common mean as the
nuisance parameter, and unknown means with the common variance as the nuisance
parameter. The simulation results are in conformity with the theoretical results
derived. The power function of the PLC test statistic for each of the above two
set-ups has been studied for different values of the alternative. Simulation studies
have been carried out for different sample sizes.

3.1. Null distributions of the PLC 

Consider first the example of Gaussian mixture models in which the main
parameters of interest are the unknown means and the common variance is the
nuisance parameter. The generating model is given by

(3.1) $f(z \mid \theta, \eta) = \eta^{-1/2}\, \phi\bigl( (z - \theta) / \eta^{1/2} \bigr)$

where $\phi$ is the standard normal probability density function ($\theta \in \mathbb{R}$,
$\eta > 0$). In this case $\hat\eta = N^{-1} \sum_{i=1}^{N} (Z_i - \bar Z)^2$ where
$\bar Z = N^{-1} \sum_{i=1}^{N} Z_i$. The corresponding PLC is denoted by $\Lambda_N^{M}$.
Simulation studies for the null distribution of $\Lambda_N^{M}$ have been performed
for sample sizes $N = 50$, 100 and 200. Percentiles of the sampling distribution are
displayed in Table 1, which shows how different percentiles $p$ (5, 50 and 95) of
the null distribution of $\Lambda_N^{M}$ decrease with increasing sample size $N$.
The difference of the percentile values (say, that between percentiles 95 and 5)
decreases with increasing sample size as well. The tabulated values are consistent
with the theoretical results obtained in Theorem 2.2.

In the second example, also pertaining to Gaussian mixture models, the main
parameters of interest are the unknown variances and the common mean is the
nuisance parameter. The generating model is

(3.2) $f(z \mid \theta, \eta) = \theta^{-1}\, \phi\bigl( (z - \eta) / \theta \bigr)$

for $\theta > 0$, $\eta \in \mathbb{R}$. Here $\hat\eta = \bar Z$. The corresponding PLC
statistic is denoted by $\Lambda_N^{V}$.
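The scale-mixture version of the statistic can be sketched in the same way as definition (1.4) prescribes: the common mean is fixed at $\bar Z$ and the two scale parameters are maximized over. The code below is only a sketch under stated assumptions: the function names (`scale_mixture_loglik`, `plc_gaussian_scales`), the EM updates for the two variances, the starting ratios and the small variance floor are illustrative choices and are not taken from the paper.

```python
import math
import random


def _log_comp(u, var):
    """Log of var^{-1/2} * exp(-u / (2 var)); the common 1/sqrt(2 pi) factor is omitted."""
    return -0.5 * math.log(var) - 0.5 * u / var


def scale_mixture_loglik(sq_dev, var1, var2):
    """Log-likelihood (1.2) for the equal-weight mixture of N(eta, var1) and N(eta, var2).

    sq_dev holds the squared deviations (z_i - eta)^2 from the fixed common mean.
    """
    total = 0.0
    for u in sq_dev:
        a, b = _log_comp(u, var1), _log_comp(u, var2)
        m = max(a, b)  # log-sum-exp trick
        total += m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))
    return total - 0.5 * len(sq_dev) * math.log(2.0 * math.pi)


def plc_gaussian_scales(data, n_iter=300, ratios=(1.5, 3.0, 6.0), floor=1e-12):
    """PLC statistic (1.4) for the scale mixture, mean fixed at its null MLE.

    Assumes the data are not all identical. EM is an illustrative optimizer.
    """
    n = len(data)
    eta_hat = sum(data) / n                       # null MLE of the common mean
    sq_dev = [(z - eta_hat) ** 2 for z in data]
    total_sq = sum(sq_dev)
    var_hat = total_sq / n                        # null MLE of theta^2
    null_ll = scale_mixture_loglik(sq_dev, var_hat, var_hat)
    best_ll = null_ll                             # theta1 = theta2 is always feasible
    for rho in ratios:
        v1, v2 = var_hat / rho, var_hat * rho
        for _ in range(n_iter):
            sum_r = sum_ru = 0.0
            for u in sq_dev:
                # E-step: responsibility of component 1, computed stably
                d = _log_comp(u, v2) - _log_comp(u, v1)
                if d > 0:
                    e = math.exp(-d)
                    r = e / (1.0 + e)
                else:
                    r = 1.0 / (1.0 + math.exp(d))
                sum_r += r
                sum_ru += r * u
            # M-step: weighted second moments about the fixed mean
            v1 = max(sum_ru / max(sum_r, floor), floor)
            v2 = max((total_sq - sum_ru) / max(n - sum_r, floor), floor)
        best_ll = max(best_ll, scale_mixture_loglik(sq_dev, v1, v2))
    return 2.0 * (best_ll - null_ll)


random.seed(0)
null_sample = [random.gauss(0.0, 1.0) for _ in range(300)]
print(plc_gaussian_scales(null_sample))
```

Starting EM from several variance ratios is a cheap guard against stopping at the stationary point $\theta_1 = \theta_2$; values that come out numerically indistinguishable from zero correspond to the "exactly zero" outcomes discussed in the simulations.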

Table 1
Percentiles of the null distribution of the PLC, corresponding to a
Gaussian mixture with unknown means and common variance as
the nuisance parameter

            Percentiles
  N       5       50      95
  50    0.008   0.011   0.014
  100   0.004   0.005   0.006
  200   0.002   0.002   0.003

The constant $c^2$ in the limiting distribution (2.6) can be computed, but the
computations are quite cumbersome. Hence the constant has been evaluated
based on the sampling distribution of $\Lambda_N^{V}$ under the null. The sampling
distribution is based on 5000 simulations of data size 2000. The value of $c^2$
thus obtained is 0.69070.

The asymptotic null distribution of $\Lambda_N^{V}$ is a mixture of a degenerate mass
at 0 and a $c^2 \chi^2_1$ (for suitable $c^2 > 0$), with mixing proportion 0.5. The
sampling distribution of $\Lambda_N^{V}$, obtained from 5000 simulations of sample
size 2000, is found to be a mixture of outcomes which are exactly zero and another
strictly positive absolutely continuous distribution. We have observed that this
absolutely continuous distribution (as obtained from simulations) is very close to
$c^2 \chi^2_1$ (where $c^2 = 0.69070$), as depicted in Figure 2. Hence simulation
studies of the null distribution show sufficient conformity to the theoretical
results obtained in Theorem 2.2.
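As a quick numerical cross-check (a sketch, not part of the original simulations), the mean and upper percentiles of the limit law $c^2 \max(0, Z)^2$ are available in closed form, since $P(c^2 \max(0, Z)^2 \le x) = \Phi(\sqrt{x / c^2})$ for $x \ge 0$. The helper names `limit_mean` and `limit_quantile` below are hypothetical; the value of $c^2$ is the simulation-based one quoted above.

```python
from statistics import NormalDist

C2 = 0.69070  # simulation-based value of c^2 quoted in the text


def limit_mean(c2):
    """Mean of c2 * max(0, Z)^2: E[max(0, Z)^2] = 0.5 * E[Z^2] = 0.5."""
    return 0.5 * c2


def limit_quantile(c2, p):
    """p-th quantile of c2 * max(0, Z)^2; the law has mass 0.5 at zero."""
    if p <= 0.5:
        return 0.0
    # P(c2 * max(0, Z)^2 <= x) = Phi(sqrt(x / c2)), so x = c2 * Phi^{-1}(p)^2
    z = NormalDist().inv_cdf(p)
    return c2 * z * z


print(round(limit_mean(C2), 3))            # mean of the limit law
print(round(limit_quantile(C2, 0.95), 3))  # 5% significance point
```

With $c^2 = 0.69070$ these evaluate to roughly 0.345 and 1.87, in line with the proxy values 0.345 and 1.86 listed under "Theor.*" in Table 2.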

Simulation studies for the null distribution of $\Lambda_N^{V}$ have been performed
and tabulated (see Table 2) for different sample sizes $N$, based on 1000 simulations
of data size $N$, where $N = 50, 100, 200$.

The expected value of the sampling distribution shows a negative bias. The
degree to which it approximates the mean of the large sample distribution of the
PLC improves with increasing sample size. The proportion of zeros in the sampling
distribution decreases with $N$ as it approaches the theoretical value 0.5. The
degree to which the sampling distribution approximates the theoretical
distribution also improves with increasing sample size in the case of the 95th
percentile.

Fig 2. Dotted line shows the kernel density estimate of $c^2 \max\{0, N(0,1)\}^2$
($c^2 = 0.69070$), the theoretical asymptotic null distribution of the PLC under
$N(0,1)$. Note that by invariance the results do not depend on the choice of the
mean and variance. The solid line is the kernel density estimate of the sampling
distribution of the PLC with the zeros left out, under the null corresponding to a
Gaussian mixture of the same set-up. This sampling distribution is based on
5000 simulations of sample size 2000.

Table 2
Summary statistics of the null distribution of the PLC, corresponding to a
Gaussian mixture with unknown variance and common mean as the nuisance parameter

          Expectation        % of zeros       5% signif. point
  N     Theor.*   Est.     Theor.   Est.     Theor.*   Est.
  50    0.345    0.156      50      70.1     1.86     0.935
  100   0.345    0.256      50      61.5     1.86     1.608
  200   0.345    0.328      50      57.5     1.86     1.817

*The sampling distribution based on 5000 simulations of sample size 2000 has been
used as a proxy for the theoretical asymptotic null distribution.

Fig 3. Solid line, dotted line and dashed line correspond to the sample sizes 200,
100 and 50 respectively in both the figures. Power functions of the PLC test
statistic at level $\alpha = 0.05$ have been evaluated. In the case of $\Lambda_N^{M}$
(left figure) the power function has been evaluated for values of the parameter in
$[0, 2]$. The power function corresponding to $\Lambda_N^{V}$ (right figure) has also
been evaluated for values of the parameter
$\max\{\theta_1, \theta_2\} / \min\{\theta_1, \theta_2\}$ in $[1, 3]$.



3.2. Power function of the PLC test statistic



Power functions corresponding to the test statistic $\Lambda_N^{M}$ at level
$\alpha = 0.05$ have been evaluated for different values of the parameter (different
values of the alternative) in the range $[0, 2]$, for three different sample sizes
$N = 50, 100, 200$ (Figure 3). The power is found to increase with increasing
sample size.

Power functions corresponding to the test statistic $\Lambda_N^{V}$ at level
$\alpha = 0.05$ have been evaluated for different values of the parameter (different
values of the alternative) $\max\{\theta_1, \theta_2\} / \min\{\theta_1, \theta_2\}$ in
the range $[1, 3]$, for three different sample sizes $N = 50, 100, 200$ (Figure 3).
The power is found to increase with increasing sample size.



Appendix: Proof of Theorem 2.2 



First, it follows from Chen et al. [4] that both the MLEs $\hat\theta_1$ and
$\hat\theta_2$ are $N^{1/4}$-consistent under (1.1). For both cases in the theorem we
re-parametrize the problem with
$\theta_1 = \hat\theta + N^{-1/2} s + N^{-1/4} \tau$ and
$\theta_2 = \hat\theta + N^{-1/2} s - N^{-1/4} \tau$ and study its behavior near
$(\hat\theta, \hat\theta)$ in the neighborhoods $|s| \le \log N$ and $|\tau| \le \log N$
respectively. In what follows we do not verify orders of remainder terms explicitly.
Several technical steps need to be verified in the process of deriving the result. We
refer to Bickel and Doksum [2], Ghosh and Sen [6] and Bose and Sengupta [3] for the
type of regularity assumptions and machinery needed for uniform approximations
in such a context. Also, note that under the above parametrization the likelihood
becomes an even function in $\tau$. Therefore we work with $\tau \ge 0$ without any
loss of generality. The asymptotic problem is non-standard because the Fisher
information matrix, $I(\theta_1, \theta_2, \eta)$, has rank 2 if $\theta_1 = \theta_2$
and 3 otherwise (this can be verified by straightforward differentiation). Next define

$H(s, t) = L_N(\hat\theta + s + t,\ \hat\theta + s - t,\ \hat\eta).$

It can be readily verified from (1.2) and (2.3) that

(A.1) $\dfrac{\partial^{\,i+j}}{\partial s^i\, \partial t^j} H(0, 0) = C^N_{ij},$

for $i, j \ge 0$. The strategy of the proof is the following. Since the expansion is
regular in the within-model displacement $s$, we fix $\tau \ge 0$ and maximize over $s$
in the first step. Then, we examine the behavior of the maximum value obtained in
the first step across $\tau$ to derive the final result. Because of our general
regularity conditions all the following calculations will be valid uniformly in
probability over the compact set $|s| \le \log N$ and $0 \le \tau \le \log N$. In what
follows $\gamma > 0$ shall denote a generic constant whose value may be determined on
a case by case basis. Also, in deriving the orders of remainders we specially
mention one simple fact from calculus, namely,
$N^{-a} (\log N)^{b} \to 0$ as $N \to \infty$ for any $a, b > 0$.

Expanding $H$ in its first argument gives

(A.2) $H(N^{-1/2} s, t) = H(0, t) + s\, [N^{-1/2} H_{10}(0, t)] + \tfrac{1}{2} s^2\, [N^{-1} H_{20}(0, t)] + o_p(N^{-\gamma}),$

where the $H_{ij}$'s denote the respective partial derivatives of $H$. Also, it can be
checked that

$H_{20}(0, t) = -N I\, (1 + o_p(N^{-\gamma})).$

Therefore, in large samples, for fixed $0 \le t \le N^{-1/4} \log N$, the maximum value
of $H(N^{-1/2} s, t)$ over the compact set $|s| \le \log N$ cannot exceed its
unrestricted global maximum, which is of the order of
$[N^{-1/2} H_{10}(0, t)]^2 / [N^{-1} H_{20}(0, t)]$. By direct Taylor series of order 4
we find

$H_{10}(0, N^{-1/4} \tau) = (2!)^{-1} [\sqrt{N}\, \bar C^N_{12}]\, \tau^2 + (4!)^{-1} [\bar C^N_{14}]\, \tau^4 + o_p(N^{-\gamma}).$

The facts required for the above simplification are: (i) $H_{10}(0, 0) = 0$ by the
maximum likelihood equation, (ii) $H_{1j}(0, 0) = 0$ for $j$ odd (since $H$ is an even
function of $t$) and (iii) the assumption of the theorem that
$E_{H_0}\, \xi_1(1)\xi_2(1) = 0$. It can be checked that the last assertion implies
$\sqrt{N}\, \bar C^N_{12} = O_p(1)$, in view of (2.4) and Lemma 2.1.

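Therefore by virtue of the assumptions of the theorem the profile global maximum
of $H(\cdot, t)$ becomes negligible in probability over the range of interest. Thus
we have

(A.3) $\max_{|s| \le \log N} H(N^{-1/2} s, t) = H(0, t) + o_p(N^{-\gamma}),$

uniformly over $0 \le t \le N^{-1/4} \log N$. Finally,

$H(0, N^{-1/4} \tau) = H(0, 0) + (2!)^{-1} [\sqrt{N}\, \bar C^N_{02}]\, \tau^2 + (4!)^{-1} [\bar C^N_{04}]\, \tau^4 + o_p(N^{-\gamma}).$

Therefore we have

(A.4) $\Lambda_N \approx 2 \max_{|s| \le \log N,\ 0 \le \tau \le \log N} \bigl[ H(N^{-1/2} s, N^{-1/4} \tau) - H(0, 0) \bigr]$
      $= \max_{0 \le \tau \le \log N} \bigl\{ [\sqrt{N}\, \bar C^N_{02}]\, \tau^2 + \tfrac{1}{12} [\bar C^N_{04}]\, \tau^4 + o_p(N^{-\gamma}) \bigr\}.$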
Now we consider case (i) of the theorem where $\bar C^N_{02} = 0$. Then (A.4) reduces to
$\Lambda_N = \max_{0 \le \tau \le \log N} \{ (1/12) [\bar C^N_{04}]\, \tau^4 + o_p(N^{-\gamma}) \}$.
Since $C_{04} < 0$ it follows from Lemma 2.1 that $\Pr\{ \bar C^N_{04} < -\delta \} \to 1$
for sufficiently small $\delta > 0$. By choosing
$\tau > 12^{1/4}\, \delta^{-1/4} N^{-\gamma/4}$ one can verify that the value of the
objective function (being maximized) becomes negative. Hence it can be verified
that $\Lambda_N \xrightarrow{P} 0$.

For case (ii), arguing along similar lines, collecting the dominant terms from
(A.3) and (A.4) and then maximizing the dominant term with respect to $\tau$ (noting
that the dominant expression is a quadratic in $\tau^2$ and
$\bar C^N_{04} \xrightarrow{P} C_{04}\ (< 0)$) we obtain

(A.5) $\Lambda_N \approx \max_{0 \le \tau \le \log N} \bigl\{ [\sqrt{N}\, \bar C^N_{02}]\, \tau^2 + \tfrac{1}{12} [C_{04}]\, \tau^4 \bigr\} = -3\, \dfrac{[\max(0, \sqrt{N}\, \bar C^N_{02})]^2}{C_{04}},$

with an error in approximation of the order of $o_p(N^{-\gamma})$ as before. Hence the
second part of the theorem follows from the assumptions.
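The maximization step in case (ii) can be spelled out as a short worked calculation. The shorthand $a = \sqrt{N}\, \bar C^N_{02}$, $b = -C_{04} > 0$ and $u = \tau^2$ is introduced here only for this sketch, and the coefficient $1/12$ is the one appearing in (A.4).

```latex
\max_{u \ge 0} \Bigl\{ a\,u - \tfrac{b}{12}\,u^{2} \Bigr\},
\qquad
\frac{d}{du}\Bigl( a\,u - \tfrac{b}{12}\,u^{2} \Bigr) = a - \tfrac{b}{6}\,u = 0
\;\Longrightarrow\;
u^{*} = \max\Bigl( 0,\ \tfrac{6a}{b} \Bigr),
\\[4pt]
a\,u^{*} - \tfrac{b}{12}\,(u^{*})^{2}
  = \frac{6a^{2}}{b} - \frac{3a^{2}}{b}
  = \frac{3\,[\max(0, a)]^{2}}{b}
  \quad \text{(the value is } 0 \text{ when } a \le 0\text{)}.
```

Since $a = O_p(1)$, the maximizer $u^{*}$ falls inside $[0, (\log N)^2]$ with probability tending to one, so the constraint on $\tau$ is not binding in the limit. If $a \Rightarrow \sigma Z$ with $Z$ standard normal, this reading gives the limit $(3\sigma^2 / |C_{04}|) \max(0, Z)^2$, which suggests $c^2 = 3\sigma^2 / |C_{04}|$ in (2.6).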